fix(security): eliminate API key leakage via ps aux across all three execution layers#1
Open
fix(security): eliminate API key leakage via ps aux across all three execution layers#1
Conversation
2afa1f4 to
47512fa
Compare
Add three argv-safe helpers to bin/lib/runner.js:
runArgv(prog, args, opts) -- spawnSync without shell
runCaptureArgv(prog, args, opts) -- execFileSync without shell; returns stdout
assertSafeName(name, label) -- validates against [a-zA-Z0-9][a-zA-Z0-9_-]{0,62}
Fix pre-existing opts.env overwrite: old spread { ...opts } after the merged env
silently clobbered it. mergeEnv(opts) destructures opts.env first.
test/runner.test.js: 22 new assertions (assertSafeName rejections, injection
PoC, opts.env preservation).
Closes shell-injection attack surface in the legacy CJS layer by replacing
all user-controlled run() / runCapture() shell strings with the new argv-safe
runArgv() / runCaptureArgv() helpers. assertSafeName() guards every
user-supplied sandbox/instance/preset name before it enters any command.
bin/lib/onboard.js -- all openshell/bash/brew calls -> runArgv;
file copies -> fs.cpSync/fs.rmSync (no cp shell)
bin/lib/nim.js -- docker pull/rm/run/stop/inspect -> runArgv/runCaptureArgv;
assertSafeName guard on sandboxName
bin/lib/policies.js -- openshell policy get/set -> runCaptureArgv/runArgv;
assertSafeName on sandboxName and presetName;
temp policy file written with mode 0o600
bin/nemoclaw.js -- setupSpark: remove inline NVIDIA_API_KEY=VALUE from
sudo argv (sudo -E already inherits env);
deploy: assertSafeName on instanceName;
sandbox connect/status/logs/destroy -> runArgv
Supersedes PRs: NVIDIA#148 (shell injection), part of NVIDIA#330 (credential leak).
… in argv Fixes: NVIDIA#325 (API key exposed in process list via ps aux) Supersedes: PRs NVIDIA#191, NVIDIA#330 The root cause: all three execution layers passed the actual credential VALUE as --credential KEY=VALUE, making it visible to any local user via `ps aux` or /proc/<pid>/cmdline. Safe pattern: set the secret in the child's inherited env, then pass only the env-var NAME to --credential (openshell env-lookup form). nemoclaw/src/commands/onboard.ts - process.env[credentialEnv] = apiKey before execOpenShell - --credential arg: credentialEnv (name only, not KEY=VALUE) - applies to both provider create and provider update paths nemoclaw-blueprint/orchestrator/runner.py - Rename credential_env -> target_cred_env with type-based fallback (nvidia -> NVIDIA_API_KEY, openai -> OPENAI_API_KEY) when not set in the blueprint profile. Supersedes PR NVIDIA#191's partial fix. - os.environ[target_cred_env] = credential before run_cmd - --credential arg: target_cred_env (name only) nemoclaw-blueprint/blueprint.yaml - Add credential_env: NVIDIA_API_KEY to the default profile. Without this field the type-based fallback would silently use OPENAI_API_KEY for the nvidia provider_type, causing auth failure. nemoclaw/src/onboard/config.ts - writeFileSync for config.json now passes mode: 0o600 so the file containing endpoint/model/credentialEnv metadata is not world-readable. test/credential-exposure.test.js (new file) - Static source scan: asserts no --credential KEY=VALUE pattern in any of the 3 execution layer files (allowlists dummy/ollama stubs) - Layer-specific structural checks (process.env set, os.environ set, blueprint default profile has credential_env) - Runtime injection PoC: proves old bash -c IS vulnerable; new runCaptureArgv IS NOT All 84 tests pass.
47512fa to
892bc25
Compare
cv
pushed a commit
that referenced
this pull request
Apr 10, 2026
…IA#1114) (NVIDIA#1305) ## Summary Fixes the four issues reported in NVIDIA#1114 — EACCES permission errors and missing gateway token when running inside the NemoClaw sandbox. ### Issue mapping | # | Reported error | Fix | |---|----------------|-----| | 1 | `EACCES: open '/sandbox/.openclaw/openclaw.json.*.tmp'` | `install_configure_guard` — intercepts `openclaw configure` with a clear error and directs users to `nemoclaw onboard --resume` on the host | | 2 | Same as #1 (different PID) | Same fix | | 3 | `EACCES: mkdir '/sandbox/.openclaw/credentials'` | Already resolved on main via NVIDIA#1519 (credentials symlink to `.openclaw-data/`) | | 4 | No WhatsApp QR code | Consequence of NVIDIA#3, also resolved by NVIDIA#1519 | ### Root cause (issues 1 & 2) OpenClaw's `configure` command performs atomic writes — it creates a temp file (`openclaw.json.PID.UUID.tmp`) in the same directory as the config. Since `/sandbox/.openclaw/` is Landlock read-only at the kernel level, file creation is rejected with EACCES. This is by design: the sandbox config is intentionally immutable at runtime. Rather than weakening Landlock (security regression), we intercept the command in the sandbox shell and guide users to the correct host-side workflow. ### Changes **1. `install_configure_guard()`** — Writes a shell function wrapper to `.bashrc`/`.profile` that intercepts `openclaw configure` and prints: ``` Error: 'openclaw configure' cannot modify config inside the sandbox. The sandbox config is read-only (Landlock enforced) for security. To change your configuration, exit the sandbox and run: nemoclaw onboard --resume This rebuilds the sandbox with your updated settings. ``` All other `openclaw` subcommands pass through to the real binary. **2. `export_gateway_token()`** — Reads `gateway.auth.token` from `openclaw.json` and exports it as `OPENCLAW_GATEWAY_TOKEN`, so interactive sessions (`openshell sandbox connect`) can authenticate with the gateway. Persists to `.bashrc`/`.profile` using idempotent marker blocks and cleans stale tokens on revocation. **3. `_read_gateway_token()` helper** — Shared Python snippet used by both `export_gateway_token` and `print_dashboard_urls` (deduplication, uses `with open()` context manager). All three are called in both root and non-root startup paths. ## Security properties preserved - `/sandbox/.openclaw` remains root-owned, Landlock read-only - `openclaw.json` remains chmod 444 (immutable) - No new attack surface — token is read-only from existing config - `command openclaw` bypass preserves all non-configure functionality Fixes NVIDIA#1114 Signed-off-by: Dongni Yang <[email protected]> Co-authored-by: Claude Sonnet 4.6 <[email protected]> Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]> --------- Signed-off-by: Dongni Yang <[email protected]> Co-authored-by: Claude Sonnet 4.6 <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes NVIDIA#325 — API key exposed in process list via
ps aux.This PR is a comprehensive rollup that supersedes open PRs NVIDIA#148, NVIDIA#191, NVIDIA#225, and NVIDIA#330. It fixes every instance of secret leakage and shell injection that was identified during a first-principles audit of all three execution layers.
Root Cause
openshell provider create --credential KEY=VALUEpasses the secret as a command-line argument. Every user on the machine can read it viaps auxor/proc/<pid>/cmdline. The same pattern existed in:bin/lib/onboard.jsrun(\openshell provider create --credential "NVIDIA_API_KEY=${process.env.NVIDIA_API_KEY}"`)`nemoclaw/src/commands/onboard.tsexecOpenShell(["--credential", \${credentialEnv}=${apiKey}`])`nemoclaw-blueprint/orchestrator/runner.pyprovider_args.extend(["--credential", f"OPENAI_API_KEY={credential}"])Fix — Three Commits
Commit 1 —
fix(runner): safe argv primitives + opts.env overwrite fixrunArgv(prog, args, opts)—spawnSyncwithout shell; no metacharacter expansionrunCaptureArgv(prog, args, opts)—execFileSyncwithout shell; returns stdout stringassertSafeName(name, label)— validates user-supplied names against[a-zA-Z0-9][a-zA-Z0-9_-]{0,62}; callsprocess.exit(1)on rejectionopts.envoverwrite bug:mergeEnv()destructuresopts.envbefore the rest spread soPATH/HOME/DOCKER_HOSTare always preservedCommit 2 —
fix(cli): shell-string → argv arrays (injection prevention)Converts every
run(\...`)` call that accepted user-controlled values across:bin/lib/onboard.js— all openshell/bash/brew calls,fs.cpSync/fs.rmSyncreplacing shellcp/rmbin/lib/nim.js— docker pull/rm/run/stop/inspect +assertSafeNameon sandboxNamebin/lib/policies.js— openshell policy get/set +assertSafeNameon both sandboxName and presetName + temp file written withmode: 0o600bin/nemoclaw.js— removes inlineNVIDIA_API_KEY=VALUEfromsudoargv (superseded bysudo -Eenv inherit);assertSafeNameon deploy instanceName; sandbox connect/status/logs/destroy →runArgvCommit 3 —
fix(credentials): env-lookup form for--credential; secrets never in argvSafe pattern: set the secret in
process.env/os.environbefore the call, then pass only the env-var name to--credential— openshell reads the value from the environment, never from argv.nemoclaw/src/commands/onboard.tsprocess.env[credentialEnv] = apiKeybeforeexecOpenShell; bothprovider createandupdatepaths changednemoclaw-blueprint/orchestrator/runner.pytarget_cred_envwith type-based fallback (supersedes NVIDIA#191);os.environ[target_cred_env] = credential;--credential target_cred_envnemoclaw-blueprint/blueprint.yamlcredential_env: NVIDIA_API_KEYtodefaultprofile — without it the fallback would pickOPENAI_API_KEYfor thenvidiaprovider typenemoclaw/src/onboard/config.tswriteFileSyncuses{ mode: 0o600 }so config.json is owner-readable onlyVerification
Key tests added:
test/runner.test.js(22 assertions) — assertSafeName rejects;,$(),|,../, spaces;runCaptureArgvdoes not expand shell metacharacters;opts.envpreservesPATHtest/credential-exposure.test.js(9 assertions) — static scan of all 3 layers for--credential KEY=VALUEpatterns; structural checks forprocess.env[credentialEnv]andos.environ[target_cred_env]; runtime injection PoCps aux before/after
Before (vulnerable):
After (safe):
The secret is in the process's environment variables, not in its argv.
PRs Superseded
isNonInteractive()inonboard.tsalready implements this--credentialarg